A Post-processing System to Yield Reduced Word Error Rates: Recognizer Output Voting Error Reduction (ROVER)
Abstract
This paper describes a system developed at NIST that produces a composite Automatic Speech Recognition (ASR) output when the outputs of multiple ASR systems are available; in many cases, the composite output has a lower error rate than any of the individual systems. The system implements a "voting" or rescoring process to reconcile differences among the ASR system outputs. We refer to this system as the NIST Recognizer Output Voting Error Reduction (ROVER) system. As additional knowledge sources (e.g., acoustic and language models) are added to an ASR system, error rates typically decrease. This paper describes a post-recognition process that treats the outputs generated by multiple ASR systems as independent knowledge sources that can be combined to generate an output with a reduced error rate. To accomplish this, the outputs of multiple ASR systems are combined into a single, minimal-cost word transition network (WTN) via iterative applications of dynamic programming (DP) alignments. The resulting network is searched by an automatic rescoring or "voting" process that selects an output sequence with the lowest score.
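The two stages named in the abstract, iterative DP alignment into a WTN followed by voting, can be illustrated with a minimal sketch. It is not the NIST implementation: the function names (`align`, `build_wtn`, `vote`), the `"@"` null label, the unit edit costs, and the plain frequency vote standing in for the abstract's scoring are all assumptions made for illustration.

```python
from collections import Counter

NULL = "@"  # assumed label for "no word in this slot"


def align(wtn, words, width):
    """Merge `words` into `wtn` (a list of slots) using edit-distance DP.

    Each slot holds one entry per system merged so far (`width` of them).
    Assumed costs: 0 if the word already occurs in the slot, otherwise 1;
    insertions and deletions cost 1.
    """
    n, m = len(wtn), len(words)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if words[j - 1] in wtn[i - 1] else 1
            cost[i][j] = min(cost[i - 1][j - 1] + sub,  # match / substitute
                             cost[i - 1][j] + 1,        # slot has no word
                             cost[i][j - 1] + 1)        # word has no slot
    # Trace back from the end, extending every slot by one entry.
    merged, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = 0 if words[j - 1] in wtn[i - 1] else 1
            if cost[i][j] == cost[i - 1][j - 1] + sub:
                merged.append(wtn[i - 1] + [words[j - 1]])
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            merged.append(wtn[i - 1] + [NULL])
            i -= 1
        else:
            merged.append([NULL] * width + [words[j - 1]])
            j -= 1
    merged.reverse()
    return merged


def build_wtn(hyps):
    """Fold each hypothesis into the network, one alignment at a time."""
    wtn = [[w] for w in hyps[0]]
    for k, words in enumerate(hyps[1:], start=1):
        wtn = align(wtn, words, width=k)
    return wtn


def vote(wtn):
    """Pick the most frequent entry per slot; drop slots won by NULL."""
    winners = (Counter(slot).most_common(1)[0][0] for slot in wtn)
    return [w for w in winners if w != NULL]


hyps = ["the cat sat on the mat".split(),
        "the cat sat on a mat".split(),
        "a cat sat on the mat".split()]
print(" ".join(vote(build_wtn(hyps))))  # prints: the cat sat on the mat
```

Each of the three hypotheses contains at most one error, yet the vote recovers the error-free sequence because no two systems err in the same slot. Ties fall to whichever word entered the slot first, a side effect of `Counter.most_common` rather than a deliberate design choice.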
Similar resources
ROVER Enhancement with Automatic Error Detection
In this paper, an approach is presented to improve the existing performance of the Recognizer Output Voting Error Reduction (ROVER) procedure used for speech decoders’ combination in automatic speech transcription. A contextual analysis is injected within the ROVER process to detect and eliminate erroneous words. This filtering is carried out through the combination of automatic error detection...
Enhancement of the ROVER's Voting Scheme Using Pattern Matching
Combining the output of several speech decoders is considered to be one of the most efficient approaches to reducing the Word Error Rate (WER) in automatic speech transcription. The Recognizer Output Voting Error Reduction (ROVER) is a well known procedure for systems’ combination. However, this technique’s performance has reached a plateau due to the limitation of the current voting schemes. T...
cROVER: Context-augmented Speech Recognizer based on Multi-Decoders' Output
The growing need for designing and implementing reliable voice-based human-machine interfaces has inspired intensive research work in the field of voice-enabled systems, and greater robustness and reliability are being sought for those systems. Speech recognition has become ubiquitous. Automated call centers, smart phones, dictation and transcription software are among the many systems currentl...
Use of Multiple Front-ends and I-vector-based Speaker Adaptation for Robust Speech Recognition
Although state-of-the-art speech recognition systems perform well in controlled environments they work poorly in realistic acoustical conditions in reverberant environments. Here, we use multiple front-ends (conventional mel-filterbank, multitaper spectrum estimation-based mel filterbank, robust mel and compressive gammachirp filterbank, iterative deconvolution-based dereverberated mel-filterba...
LV-ROVER: Lexicon Verified Recognizer Output Voting Error Reduction
Offline handwritten text line recognition is a hard task that requires both an efficient optical character recognizer and language model. Handwriting recognition state of the art methods are based on Long Short Term Memory (LSTM) recurrent neural networks (RNN) coupled with the use of linguistic knowledge. Most of the proposed approaches in the literature focus on improving one of the two compo...